23 research outputs found

    Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer

    Get PDF
    An important goal of research in Deep Reinforcement Learning for mobile robotics is to train agents capable of solving complex tasks that require a high level of scene understanding and reasoning from an egocentric perspective. When agents are trained in simulation, the ideal environment would satisfy a currently unobtainable combination of high-fidelity photographic observations, a massive number of different environment configurations, and fast simulation speeds. In this paper we argue that research on training agents capable of complex reasoning can be simplified by decoupling it from the requirement of high-fidelity photographic observations. We present a suite of tasks requiring complex reasoning and exploration in continuous, partially observable 3D environments. The objective is to provide challenging scenarios and a robust baseline agent architecture that can be trained on mid-range consumer hardware in under 24 hours. Our scenarios combine two key advantages: (i) they are based on a simple but highly efficient 3D environment (ViZDoom) which allows high-speed simulation (12,000 fps); (ii) they provide the user with a range of difficulty settings, in order to identify the limitations of current state-of-the-art algorithms and network architectures. We aim to increase accessibility to the field of Deep RL by providing baselines for challenging scenarios where new ideas can be iterated on quickly. We argue that the community should be able to address challenging problems in reasoning for mobile agents without the need for a large compute infrastructure.

    Learning to plan with uncertain topological maps

    Get PDF
    We train an agent to navigate in 3D environments using a hierarchical strategy that combines a high-level graph-based planner with a local policy. Our main contribution is a data-driven, learning-based approach to planning under uncertainty in topological maps, which requires estimating shortest paths in valued graphs with a probabilistic structure. Whereas classical symbolic algorithms achieve optimal results on noiseless topologies, or optimal results in a probabilistic sense on graphs with probabilistic structure, we aim to show that machine learning can overcome missing information in the graph by taking into account rich high-dimensional node features, for instance the visual information available at each location of the map. Compared to purely learned neural white-box algorithms, we structure our neural model with an inductive bias towards dynamic-programming-based shortest-path algorithms, and we show that a particular parameterization of our neural model corresponds to the Bellman-Ford algorithm. Through an empirical analysis of our method in simulated photo-realistic 3D environments, we demonstrate that the inclusion of visual features in the learned neural planner outperforms classical symbolic solutions for graph-based planning. (ECCV 2020)
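
The abstract above notes that a particular parameterization of the neural planner corresponds to the Bellman-Ford algorithm. As a point of reference, here is a minimal classical Bellman-Ford over a small hypothetical valued graph (the graph and numbers are illustrative, not from the paper); the learned planner effectively replaces the min/sum relaxation with learned operators over high-dimensional node features.

```python
def bellman_ford(num_nodes, edges, source):
    """Classical Bellman-Ford: shortest-path distances from `source`.

    edges: list of (u, v, weight) tuples for a directed valued graph.
    Returns a list of distances; unreachable nodes stay at float('inf').
    """
    dist = [float('inf')] * num_nodes
    dist[source] = 0.0
    # Relax every edge |V| - 1 times; this min/sum relaxation is the
    # operation the learned neural planner generalizes.
    for _ in range(num_nodes - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

# Toy topological map with edge traversal costs (hypothetical example)
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 4.0), (2, 3, 1.0)]
print(bellman_ford(4, edges, source=0))  # [0.0, 1.0, 3.0, 4.0]
```

The symbolic version needs exact edge weights; the paper's point is that when weights or connectivity are uncertain, node features (e.g. images) can compensate.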

    EgoMap: Projective mapping and structured egocentric memory for Deep RL

    Full text link
    Tasks involving localization, memorization and planning in partially observable 3D environments are an ongoing challenge in Deep Reinforcement Learning. We present EgoMap, a spatially structured neural memory architecture that improves a deep reinforcement learning agent's performance in 3D environments on challenging tasks with multi-step objectives. The EgoMap architecture incorporates several inductive biases, including a differentiable inverse projection of CNN feature vectors onto a top-down, spatially structured map. The map is updated with ego-motion measurements through a differentiable affine transform. We show that this architecture outperforms both standard recurrent agents and state-of-the-art agents with structured memory. We demonstrate that incorporating these inductive biases into an agent's architecture allows for stable training with reward alone, circumventing the expense of acquiring and labelling expert trajectories. A detailed ablation study demonstrates the impact of key aspects of the architecture, and through extensive qualitative analysis we show how the agent exploits its structured internal memory to achieve higher performance.
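
To make the ego-motion map update concrete, here is a deliberately simplified sketch of re-centring a top-down egocentric feature map after the agent moves. It uses nearest-neighbour resampling and translation only; the actual EgoMap update is a differentiable affine transform with sub-cell sampling, and the function name and grid sizes here are hypothetical.

```python
import numpy as np

def shift_egomap(ego_map, dx, dy):
    """Re-centre a top-down egocentric feature map after ego-motion.

    ego_map: (H, W, C) grid of features with the agent at the centre cell.
    dx, dy: ego-motion expressed in grid cells.
    Simplified nearest-neighbour stand-in for EgoMap's differentiable
    affine resampling; rotation is omitted for brevity.
    """
    h, w, _ = ego_map.shape
    shifted = np.zeros_like(ego_map)
    ys, xs = np.mgrid[0:h, 0:w]
    # Each output cell samples the cell its features came from before the move.
    src_y, src_x = ys + dy, xs + dx
    valid = (0 <= src_y) & (src_y < h) & (0 <= src_x) & (src_x < w)
    shifted[ys[valid], xs[valid]] = ego_map[src_y[valid], src_x[valid]]
    return shifted

m = np.zeros((5, 5, 1))
m[2, 2, 0] = 1.0                      # feature stored at the agent's cell
moved = shift_egomap(m, dx=0, dy=1)   # agent advances one cell
assert moved[1, 2, 0] == 1.0          # feature shifts opposite to the motion
```

Cells shifted in from outside the map boundary are zero-filled, which is why the memory must be combined with new observations at every step.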

    Graph augmented Deep Reinforcement Learning in the GameRLand3D environment

    Get PDF
    We address planning and navigation in challenging 3D video games featuring maps with disconnected regions that agents can reach only by using special actions. In this setting, classical symbolic planners are not applicable or are difficult to adapt. We introduce a hybrid technique combining a low-level policy trained with reinforcement learning and a graph-based high-level classical planner. In addition to providing human-interpretable paths, the approach improves the generalization performance of an end-to-end approach in unseen maps, where it achieves a 20% absolute increase in success rate over a recurrent end-to-end agent on a point-to-point navigation task in previously unseen large-scale maps of size 1 km × 1 km. In an in-depth experimental study, we quantify the limitations of end-to-end Deep RL approaches in vast environments, and we also introduce "GameRLand3D", a new benchmark and soon-to-be-released environment that can generate complex procedural 3D maps for navigation tasks.
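
The hybrid scheme described above can be sketched as a classical shortest-path search over a waypoint graph whose edges represent legs the low-level RL policy is known to handle; the graph, names, and costs below are hypothetical, and the execution loop is a placeholder for the learned controller.

```python
import heapq

def dijkstra_path(graph, start, goal):
    """High-level plan: cheapest waypoint sequence in a navigation graph.

    graph: {node: [(neighbour, cost), ...]}, where each edge is a local
    leg the low-level policy can traverse (possibly via special actions).
    """
    frontier, seen = [(0.0, start, [start])], set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, c in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(frontier, (cost + c, nxt, path + [nxt]))
    return None  # goal unreachable in the waypoint graph

# Hypothetical waypoint graph; the (C, D) edge could be a jump-pad link
# between otherwise disconnected regions.
graph = {"A": [("B", 1.0)], "B": [("C", 1.0)], "C": [("D", 5.0)], "D": []}
plan = dijkstra_path(graph, "A", "D")
print(plan)  # ['A', 'B', 'C', 'D']

# Low-level execution: the trained policy handles each local leg in turn.
for src, dst in zip(plan, plan[1:]):
    pass  # policy.navigate(src, dst) -- placeholder for the RL controller
```

The division of labour is the point: the graph search provides interpretable long-range structure, while the policy absorbs the local control that symbolic planners cannot model.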

    Zephyr: Direct Distillation of LM Alignment

    Full text link
    We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e., they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment. The approach requires only a few hours of training without any additional sampling during fine-tuning. The final result, Zephyr-7B, sets the state of the art on chat benchmarks for 7B-parameter models, and requires no human annotation. In particular, results on MT-Bench show that Zephyr-7B surpasses Llama2-Chat-70B, the best open-access RLHF-based model. Code, models, data, and tutorials for the system are available at https://github.com/huggingface/alignment-handbook.
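
The DPO objective underlying dDPO can be written per preference pair; the sketch below uses illustrative scalar log-probabilities (the values and function name are hypothetical, and in dDPO the chosen/rejected ranking comes from a teacher model rather than human annotators).

```python
import math

def dpo_loss(pi_logp_chosen, pi_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((log pi_w - log ref_w) - (log pi_l - log ref_l)))

    pi_*: log-probs under the policy being trained;
    ref_*: log-probs under the frozen (dSFT) reference model.
    """
    margin = ((pi_logp_chosen - ref_logp_chosen)
              - (pi_logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A policy that favours the chosen answer more than the reference does
# gets a positive margin and hence a small loss.
print(round(dpo_loss(-10.0, -40.0, -12.0, -30.0), 4))  # 0.2633
```

Because the loss needs only log-probabilities of already-ranked completions, no sampling is required during fine-tuning, which is what keeps training to a few hours.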

    The British Army, information management and the First World War revolution in military affairs

    Get PDF
    Information Management (IM) – the systematic ordering, processing and channelling of information within organisations – forms a critical component of modern military command and control systems. As a subject of scholarly enquiry, however, the history of military IM has been relatively poorly served. Employing new and under-utilised archival sources, this article takes the British Expeditionary Force (BEF) of the First World War as its case study and assesses the extent to which its IM system contributed to the emergence of the modern battlefield in 1918. It argues that the demands of fighting a modern war resulted in a general, but not universal, improvement in the BEF's IM techniques, which in turn laid the groundwork, albeit in embryonic form, for the IM systems of modern armies. Keywords: British Army, Information Management, First World War, Revolution in Military Affairs, Adaptation

    Large-scale machine learning of autonomous agent behavior with structured Deep Reinforcement Learning

    No full text
    Autonomous robotic agents have begun to impact many aspects of our society, with applications in automated logistics, autonomous hospital porters, manufacturing and household assistants. The objective of this thesis is to explore Deep Reinforcement Learning approaches to planning and navigation in large and unknown 3D environments. In particular, we focus on tasks that require exploration and memory in simulated environments. An additional requirement is that learned policies should generalize to unseen map instances. Our long-term objective is the transfer of a learned policy to a real-world robotic system. Reinforcement learning algorithms learn from interactions. By acting with the objective of accumulating a task-based reward, an Embodied AI agent must learn to discover relevant semantic cues such as object recognition and obstacle avoidance, if these skills are pertinent to the task at hand. This thesis introduces the field of Structured Deep Reinforcement Learning and then describes five contributions that were published during the PhD. We start by creating a set of challenging memory-based tasks whose performance is benchmarked with an unstructured memory-based agent. We then demonstrate how the incorporation of structure, in the form of a learned metric map, differentiable inverse projective geometry and self-attention mechanisms, augments the unstructured agent, improving its performance and allowing us to interpret the agent's reasoning process.
We then move from complex tasks in visually simple environments to more challenging environments with photo-realistic observations, extracted from scans of real-world buildings. In this work we demonstrate that augmenting such an agent with a topological map can improve its navigation performance. We achieve this by learning a neural approximation of a classical path-planning algorithm, which can be utilized on graphs with uncertain connectivity. Building on work undertaken over the course of a 4-month internship in the R&D department of Ubisoft, we demonstrate that structured methods can also be used for navigation and planning in challenging video game environments, where we couple a low-level neural policy with a classical planning algorithm to improve long-distance planning and navigation performance in vast environments of 1 km × 1 km. Finally, we develop an open-source Deep Reinforcement Learning interface for the Godot Game Engine, allowing for the construction of complex virtual worlds and the learning of agent behaviors with a suite of state-of-the-art algorithms.